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Abstract 

The purpose of this paper is (i) to present a generic and fully functional implemen- 
tation of the density-matrix renormalization group (DMRG) algorithm, and (ii) to 
describe how to write additional strongly-correlated electron models and geometries 
by using templated classes. Besides considering general models and geometries, the 
code implements Hamiltonian symmetries in a generic way and parallelization over 
symmetry-related matrix blocks. 
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1 Introduction 



In many materials of technological interest strong interactions between the 
electrons lead to collective behavior. These systems, referred to as strongly cor- 
related electrons systems, display a broad range of important phenomena [2I], 
and their study is a major area of research in condensed matter physics. In 
this context, model Hamiltonians are used to simulate the relevant interac- 
tions of a given compound, and the relevant degrees of freedom. These studies 
rely on the use of tight-binding lattice models that consider electron localiza- 
tion, where states on one site can be labeled by spin and orbital degrees of 
freedom. Examples of these models include the Hubbard model j^, 0], the t-J 
model [5I, [oj and the spin 1/2 Heisenberg model, which can be considered the 
undoped limit of the t-J model. 

Non-perturbative methods to solve these fairly rich and complicated models 
include 0] (i) quantum Monte Carlo methods, and (ii) diagonalization meth- 
ods. These two paths to solve the problem are more or less complementary. 
Quantum Monte Carlo methods, being formulated in Matsubara frequency. 
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have difficulty obtaining real frequency properties of the model (such as the 
density-of-states), and sometimes suffer from the so-called "sign problem"!^. 
On the other hand, diagonalization methods usually work efficiently only at 
zero or low temperature, due to the high computational cost of obtaining a 
full spectrum. Indeed, for exact diagonalization methods, the Hilbert space 
over which the problem is formulated -and hence the size of the Hamiltonian 
matrix to be diagonalized- grows exponentially with the size of the system. 



In 1992, S. White [Ij introduced the density-matrix renormalization group 
(DMRG) method. The DMRG is a numerical variational technique to study 
quantum many body Hamiltonians that could be classified as a diagonalization 
method. For one-dimensional and quasi one-dimensional systems, this method 
is able to truncate, with bounded errors and in a general and efficient way, the 
underlying Hilbert space to a constant size. A full discussion of the DMRG 
is beyond the scope of the present paper, and I will only present a brief pro- 
cedural description of the method. Readers not familiar with the method are 
referred to the many published reviews j^, [lol, 11 1, and to the original paper 
1. 



The present paper and accompanying code can be used in different ways. 
Physicists will be able to immediately use the flexible input file to run the 
code (see Section 7) for the Hubbard model with inhomogeneous couplings, 
Hubbard U values, and on-site potentials, as well as different symmetries, ei- 
ther on one- dimensional chains or on n-leg ladders. Readers with knowledge 
of DMRG will be able to understand the motivation for abstraction in the im- 
plementation of the algorithm (Section 2). Readers with knowledge of C++ 
will be able to understand the overall implementation of the DMRG algo- 
rithm (Section 3), and write additional models (Section 4.1) and geometries 
(Section 4.2) by following the interface provided. Two models are included as 
examples: the Hubbard model and the spin 1/2 Heisenberg model, and two 
geometries: the one-dimensional chain and the n-leg ladder. Readers inter- 
ested in parallelization and performance issues (Section 4.3) will be able to 
write other concrete concurrency classes suited to their particular hardware 
requirements, following the code's parallelization abstract interface. Finally, 
conclusions are presented in Section 5. 



Other software projects, such as the ALPS project [12|, also implement the 
DMRG algorithm within their own frameworks. However, this paper and 
DMRG++ emphasize generic programming, strongly correlated electron sys- 
tems, detailed explanations, and few or no software dependencies. 



The rest of this section is dedicated to a brief overview of the DMRG method, 
and to introduce some conventions and notation used throughout the paper. 
Let us define block to mean a finite set of sites. Let C denote the states of a 
single site. This set is model dependent. For the Hubbard model it is given by: 
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□ □ , etc., 
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□ 



system 



environment 



Fig. 1. Labeling of blocks for the DMRG procedure. Blocks from vector of blocks 
X are added one at a time to block S to form the system and blocks from vector of 
blocks Y are added one at a time to E to form the environment. Blocks are vectors 
of integers. The integers (numbers at the top of the figure) label all sites in a fixed 
and unique way. 

C = {e, t, I, (t, i)}, where e is a formal element that denotes an empty state. 
For the t-J model it is given by C = {e, t, |}, and for the spin 1/2 Heisenberg 
model by C = {t,!}- A real-space-based Hilbert space V on a block B and 
set C is a Hilbert space with basis . I will simply denote this as V{B) and 
assume that C is implicit and fixed. A real-space-based Hilbert space can also 
be thought of as the external product space of ^B Hilbert spaces on a site, 
one for each site in block B. We will consider general Hamiltonians that act 
on Hilbert spaces V, as previously defined. 



I give a procedural description of the DMRG method in the following. We start 
with an initial block S (the initial system) and E (the initial environment). 
Consider two sets of blocks X and Y. We will be adding blocks from X to S, 
one at a time, and from Y to E, one at a time. Again, note that X and Y 
are sets of blocks whereas S and E are blocks. This is shown schematically in 
Fig. 1. All sites in S, X, Y and E are numbered as shown in the figure. Now 
we start a loop for the DMRG "infinite" algorithm by setting step = and 
Vr{S) = V{S) andVR{E) = V{E). 

The system is grown by adding the sites in Xstep to it, and let S' = S U Xstep, 
i.e. the step-th block of X to is added to form the block S'; likewise, let 
E' = E U Ystep- Let us form the following product Hilbert spaces: V(S") = 
VR{S)®V{X,tep) and V{E') = Vr{E)®V {Ystep) and their union V{S')®V{E') 
which is disjoint. 

Consider Hs'ue', the Hamiltonian operator, acting on V(>S") ® V{E'). We di- 
agonalize Hs'ue' (using Lanczos) to obtain its lowest eigenvector: 

l^)= E I/?), (1) 

aeV(5'),/3GV{£;') 

where is a basis of V(S") and is a basis of V{E'). 

Let us define the density matrices for system: 

{ps)a,a' = E C',/3^",/3 (2) 
fdeViE') 
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in V(S"), and environment: 



{Pe)p,P' = C,/3'^a,/3 (3) 

aeV{S') 



in V{E'). We then diagonahze ps, and obtain its eigenvalues and eigenvectors, 
in V(5") ordered in decreasing eigenvalue order. We change basis for the 
operator H^' (and other operators as necessary), as follows: 



We proceed in the same way for the environment, diagonalize p^; to obtain 
ordered eigenvectors w^, and define (^H^^™ basis^ 



/a, a' 



Let ms be a fixed number that corresponds to the number of states in V(S") 
that we want to keep. Consider the first ms eigenvectors , and let us call the 
Hilbert space spanned by them, Vr{S'), the DMRG-reduced Hilbert space on 
block S'. If ms > 7^V(S") then we keep all eigenvectors and there is effectively 
no truncation. We truncate the matrices {H^ "''^ basis^ (and other operators as 
necessary) such that they now act on this truncated Hilbert space, Vr{S'). 
We proceed in the same manner for the environment. 

Now we increase step by 1, set S ^ S', Vr{S) <— Vr{S'), Hs' <— i^s, and 
similarly for the environment, and continue with the growth phase of the 
algorithm. 

In the infinite algorithm, the number of sites in the system and environment 
grows as more steps are performed. After this infinite algorithm, a finite al- 
gorithm is applied where the environment is shrunk at the expense of the 
system, and the system is grown at the expense of the environment. During 
the finite algorithm phase the total number of sites remains constant allow- 
ing for a formulation of DMRG as a variational method in a basis of matrix 
product states. The advantage of the DMRG algorithm is that the trunca- 
tion procedure described above keeps the error bounded and small. Assume 



ms = mE = m. At each DMRG step[l3| the truncation error €tr = J2i>m 



where Aj are the eigenvalues of the truncated density matrix in decreasing 



order. The parameter m should be chosen such that etr remains small, say [13 
etr < 10~^. For critical ID systems etr decays as a function of m with a power 
law, while for ID system away from criticality it decays exponentially. For a 
more detailed description of the error introduced by the DMRG truncation in 



other systems see [9|, [lOl, lUl, [13 
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2 Motivation for Generic Programming 



Let us motivate the discussion by introducing a typical problem to be solved 
by DMRG: "Using tlie DMRG metliod, one would like to calculate the local 
density of states on all sites for a Hubbard model with inhomogeneous Hub- 
bard U values on a one-dimensional (ID) chain". We want to modularize as 
many tasks mentioned in the last sentence as possible. We certainly want to 
separate the DMRG solver from the model in question, since we could later 
want to do the same calculation for the t-J model; and the model from the 
lattice, since we might want to do the same calculation on, say, a n-leg lad- 
der, instead of a ID chain. C++ is a computer language that is very fit for 
this purpose, since it allows to template classes. Then we can write a C++ 
class to implement the DMRG method (DmrgSolver class), and template 
this class on a strongly-correlated-electron (SCE) model template, so that we 
can delegate all SCE model related code to the SCE model class. 

However, for DmrgSolver to be able to use a given SCE model, we need a 
convention that such SCE model class will have to follow. This is known as a 
C++ public interface, and for a SCE model it is given in DmrgModelBase. 
To do the calculation for a new SCE model, we simply need to implement 
all functions found in DmrgModelBase without changing the DmrgSolver 
class. The model will, in turn, be templated on the geometry. For example, the 
Hubbard model with inhomogeneous Hubbard U values and inhomogeneous 
hoppings (class DmrgModelHubbard) delegates all geometry related oper- 
ations to a templated geometry class. Then DmrgModelHubbard can be 
used for, say, one-dimensional chains and n-leg ladders without modification. 
This is done by just instantiating DmrgModelHubbard with the appropri- 
ate geometry class, either DmrgGeometryOneD or DmrgGeometryLadder, 
or some other class that the reader may wish to write, which implements the 
interface given in DmrgGeometryBase. 

In the following sections I will describe these different modules. Since the 
reader may wish to first understand how the DMRG method is implemented, 
I will start with the core C++ classes that implement the method. The user 
of the program will not need to change these core classes to add functionality. 
Instead, new models and geometries can be added by creating implementa- 
tions for DmrgModelBase and DmrgGeometryBase, and those public 
interfaces will be explained next. 

But for now I end this section by briefly describing the "driver" program 
for a Hubbard model on a ID chain. The driver program is contained in the 
file main.cpp. This file is created by the configure.pl script after answering 
questions related to model and geometry (see also Section 7). There, Dm- 
rgSolver is instantiated with DmrgModelHubbard, since in this case one 
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wishes to perform the calculation for the Hubbard model. In turn, Dmrg- 
ModelHubbard is instantiated with DmrgGeometryOneD since now one 
wishes to perform the calculation on a ID chain. 



3 Core Classes: The DMRG Solver and Bases 

3.1 DmrgSolver and The "Infinite" DMRG Algorithm 

The purpose of the DmrgSolver class is to perform the loop for the DMRG 
"infinite algorithm" discussed before. This class also performs the "finite algo- 
rithm" to allow for the calculation of observables, such as the local density 
of states of the clustei[I], defined as 

/oo -J. - 

dt{ij\e-'"'cle'"'cie'^'\^P), (5) 
-oo 

where \ifj) is the ground state of the system. The program is structured as a 
series of header files containing the implementatioiH] with each class written 
in the header file of the same name, and a "driver" program that uses the 
capabilities provided by the header files to solve a specific problem. To simplify 
the discussion, we start where the "driver program" starts, in its int main() 
function, which calls dmrgSolver.main(), whose main work is to perform the 
loop for the "infinite" DMRG algorithm. Let us now discuss this loop which 
is found in the infiniteDmrgLoop function, and is sketched in Fig. 2. 

In Fig. 2(a) the system pS is grown by adding the sites contained in block 
X[step]. Note that X is a vector of blocks to be added one at a tim^. The 
block X[step] (usually just a single site) is added to the right of pS, hence 
the GROW -RIGHT flag. The result is stored in pSprime. Similarly is done in 
Fig. 2(b) for the environment: the block Y[step] (usually just a single site) is 
added to the environment given in pE and stored in pEprime. This time the 
addition is done to the left of pE, since pE is the environment. In Fig. 2(c) 
the outer product of pSprime (the new system) and pEprime (the new en- 
vironment) is made and stored in pSE. The actual task is delegated to the 
DmrgBasis class (see Section 3.2). In Fig. 2(d) the diagonalization of the 

^ In general one would want to calculate the Green function Gij{u)) and this ob- 
servable can be implemented in a similar way. 

^ Traditionally, implementation is written in cpp files that are compiled separately. 
However, here templates are used heavily, and to avoid complications related to 
templates that some C-I-+ compilers cannot handle, we choose to have only one 
translation unit. 

^ So X is a vector of vector of integers, and X[step] is a vector of integers. 
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for (step=0;step <X . size {); step -\ — h) { 
// grow system (a) 

grow{ pSprime ,pS ,X[ step ] , model ,GROW_raGHT) ; 
// grow environment ( b ) 

grow( pEprime ,pE,Y[ step ] , model ,GROW_LEFT; 
// product of system, and environment ( c ) 
pSE . setToProduct (pSprime , pEprime) ; 

diagonalize { psi , pSprime , pEprime , pSE , 

model ) ; // (d) 
ns— pSprime . s i z e ( ) ; 
n e— p Eprime. size () ; 

changeAndTruncateBasis{pS , psi , pSprime 

pSE.O); // (e) 
changeAndTruncateBasis{ pE , p s i , pEprime 

pSE,l); // (f) 

system Stack . push(pS) ; // ( g ) 

} 



Fig. 2. Implementation of the "infinite" DMRG loop for a general SCE model on a 
general geometry. 

Hamiltonian for block pSE is performed, and the ground state vector is com- 
puted and stored in psi, following Eq. (1). Next, in Fig. 2(e) the bases are 
changed following Eqs. (2,3,4), truncated if necessary, and the result is stored 
in pS for the system, and in pE, Fig. 2(f), for the environment. Note that this 
overwrites the old pS and pE, preparing these variable for the next DMRG 
step. 

A copy of the current state of the system is pushed into a last-in-first-out 
stack in Fig. 2(g), so that it can later be used in the finite DMRG algorithm 
(not discussed here, see code). The loop continues until all blocks in vector 
of blocks X have been added to the initial system S, and all blocks in vector 
of blocks Y have been added to the initial environment E. We repeat again 
that vector of sites are used instead of simply sites to generalize the growth 
process, in case one might want to add more than one site at a time. 

I will later go back to this infinite DMRG loop and discuss the implementation 
of the steps mentioned in the previous paragraph (i.e., growth process, outer 
products, diagonalization, change of basis and truncation). However, some of 
these capabilities need first the introduction of two new C-I--I- classes to handle 
operations related to Hilbert space bases. 



, ns , ne , 
, ns , ne , 



3.2 DmrgBasis Class: Implementation of Symmetries 



DMRG-|--|- has two C++ classes that handle the concept of a basis (of a 
Hilbert space). The first one (DmrgBasis) handles reordering and symmetries 
in a general way, without the need to consider operators. The second one 
(DmrgBasisWithOperators) does consider operators, and will be explained 
in the next sub-section. The advantage of dividing functionality in this way 
will become apparent later. 
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In any actual computer simulation the "infinite" DMRG loop will actually 
stop at a certain point. Let us say that it stops after 50 sites have been 
added to the syste n0. There will also be at this point another 50 sites that 
constitute the environment. Now, from the beginning each of these 100 sites 
is given a fixed number from to 99. Therefore, sites are always labeled in a 
fixed way and their labels are always known (see Fig. 1). The variable block_ 
of a DmrgBasis object indicates over which sites the basis represented by 
this object is being built. To explain the rest of the capability handled by 
the DmrgBasis class, I need to explain how symmetries are treated in the 
program, and how the Hilbert space basis is partitioned. This is explained in 
the following. 

Symmetries will allow the solver to block the Hamiltonian matrix in blocks, 
using less memory, speeding up the computation and allowing the code to par- 
allelize matrix blocks related by symmetry. Let us assume that our particular 
model has Ns symmetries labeled by < a < A^^. Therefore, each element k 
of the basis has Ng associated "good" quantum numbers qk^a- These quantum 
numbers can refer to practically anything, e.g., to number of particles with a 
given spin or orbital or to the z component of the spin. We do not need to 
know the details to block the matrix. However, we know that these numbers 
are finite, and let Q be an integer such that qk,a < Q Wk,a. We can then com- 
bine all these quantum numbers into a single one, like this: Qk = J2aQk,aQ°', 
and this mapping is bijective. In essence, we combined all "good" quantum 
numbers into a single one and from now on we will consider that we have only 
one Hamiltonian symmetry called the "effective" symmetry, and only one cor- 
responding number qk, the "effective" quantum number. These numbers are 
stored in the member quantumNumhers of C-|--|- class DmrgBasisImplemen- 
tation. (Note that if one has 100 sites or lessO] then the number Q defined 
above is probably of the order of hundreds for usual symmetries, making this 
implementation very practical for systems of correlated electrons.) 

We then reorder our basis such that its elements are given in increasing q 
number. There will be a permutation vector associated with this reordering, 
that will be stored in the member permutation Vector of class DmrgBasisIm- 
plementation. 

What remains to be done is to find a partition of the basis which labels 
where the quantum number changes. Let us say that the quantum numbers 
of the reordered basis states are {3, 3, 3, 3, 8, 8, 9, 9, 9, 15, ■ ■ ■ }. Then we define 
a vector named "partition", such that partition [0]=0, partition[l]=4, because 

^ For simplicity, this explanatory text considers the case of blocks having a single 
site, so one site is added at a time, but a more general case can be handled by the 
code. 

^ This is probably a maximum for systems of correlated electrons such as the Hub- 
bard model or the t-J model. 
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the quantum number changes in the 4th position (from 3 to 8), and then 
partition [2] =6, because the quantum number changes again (from 8 to 9) 
in the 6th position, etc. Now we know that our Hamiltonian matrix will be 
composed first of a block of 4x4, then of a block of 2x2, etc. 

The quantum numbers of the original (untransformed) real-space basis are 
set by the model class (to be described in Section 4.1), whereas the quantum 
numbers of outer products are handled by the class DmrgBasis. This can 
be done because if \a) has quantum number and \b) has quantum number 
Qb, thei{3 |a) ® 1^) has quantum number Qa + qt- DmrgBasis also knows how 
quantum numbers change when we change the basis: they do not change since 
the DMRG transformation preserves quantum numbers; and DmrgBasis also 
knows what happens to quantum numbers when we truncate the basis: quan- 
tum numbers of discarded states are discarded. In this way, symmetries are im- 
plemented efficiently, with minimal dependencies and in a model-independent 
way. 

3.3 DmrgBasisWithOperators Class and Outer Product of Operators 

C++ class DmrgBasis implements only certain functionality associated with 
a Hilbert space basis, as mentioned in the previous section. However, more 
capabilities related to a Hilbert space basis are needed. 

C++ class DmrgBasisWithOperators inherits from DmrgBasis, and con- 
tains certain local operators for the basis in question, as well as the Hamilto- 
nian matrix. The operators that need to be considered here are operators nec- 
essary to compute the Hamiltonian across the system and environment, and 
to compute observables. Therefore, the specific operators vary from model 
to model. For example, for the Hubbard model, we consider Cjg^ operators, 
that destroy an electron with spin a on site i. For the spin 1/2 Heisenberg 
model, we consider operators and for each site i. In each case these 
operators are calculated by the model class (see Section 4.1) on the "natu- 
ral" basis, and added to the basis in question with a call to setOperators(). 
These local operators are stored as sparse matrices to save memory, although 
the matrix type is templated and could be anything. For details on the im- 
plementation of these operators, see OperatorsBase and the two examples 
provided OperatorsHubbard and OperatorsHeisenberg for the Hubbard 
and Heisenberg models, respectively. Additionally, DmrgBasisWithOperators 
has a number of member functions to handle operations that the DMRG 
method performs on local operators in a Hilbert space basis. These include 
functions to create an outer product of two given Hilbert spaces, to transform 
a basis, to truncate a basis, etc. 

^ Local symmetries must be assumed here. 
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Let us now go back to the "infinite" DMRG loop and discuss in more detail 
Fig. 2(a) ((b) is similar)), i.e., the function grow(), which is found in Dmrg- 
Solver. Local operators are set for the basis in question with a call to Dmrg- 
BasisWithOperators's member function setOperators(). When adding sites 
to the system or environment the program does a full outer product, i.e., it 
increases the size of all local operators. This is performed by the call to set- 
ToProduct(pSprime,pS,Xhasis,dir, option) in the grow function, which actually 
calls pSprime.setToProduct(pS,xBasis,dir). This function also recalculates the 
Hamiltonian in the outer product of (i) the previous system basis pS, and (ii) 
the basis Xhasis corresponding to the site(s) that is (are) being added. To 
do this, the Hamiltonian connection between the two parts needs to be calcu- 
lated and added, and this is done in the call to addHamiltonianConnection, 
found in the function grow(). Finally, the resulting dmrgBasis object for the 
outer product, pSprime, is set to contain this full Hamiltonian with the call 
to pSprime.setHarniltonian(matrix) . 

I will now explain how the full outer product between two operators is imple- 
mented. If local operator A lives in Hilbert space A and local operator B lives 
in Hilbert space B, then C = AB lives in Hilbert space C = A^ B. Let ai 
and a2 represent states of A, and let /3i and P2 represent states of B. Then, 
in the product basis, Cai,i3i-a2,i32 = ^ai,a2Bpi,i32- Additionally, C is reordered 
such that each state of this outer product basis is labeled in increasing effec- 
tive quantum number (see Section 3.2). In the previous example, if the Hilbert 
spaces A and B had sizes a and 6, respectively, then their outer product would 
have size ah. When we add sites to the system (or the environment) the mem- 
ory usage remains bounded by the truncation, and it is usually not a problem 
to store full product matrices, as long as we do it in a sparse way (DMRG++ 
uses compressed row storage). In short, local operators are always stored in 
the most recently transformed basis for all sites and, if applicable, all values 
of the internal degree of freedom a. 

This simplifies the implementation, but it must be remembered that only 
the local operators corresponding to the most recently added sites will be 
meaningful. Indeed, if we apply transformation W (possibly truncating the 
basis, see Eq. (4)) then 

{W^AW){W^BW) ^ W\AB)W, (6) 

since WW^ 7^ 1 because the DMRG truncation does not assure us that VF^ will 
be the right inverse of W (but W^W = 1 always holds). Because of this reason 
we cannot construct the Hamiltonian simply from transformed local operators, 
even if we store them for all sites, but we need to store also the Hamiltonian 
in the most recently transformed basi^. The fact that DmrgBasis With- 

Other observables do not suffer from this problem, because they need only be 
computed during the finite algorithm phase, when WW^ = 1 holds within trunca- 
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Operators stores local operators in the most recently transformed basis for 
all sites does not increase memory usage too much, and simplifies the writing 
of code for complicated geometries or connections. The SCE model class is 
responsible for determining whether a transformed operator can be used (or 
not because of the reason mentioned above). 

Let us now examine in more detail Fig. 2(c), where we form the outer product 
of the current system and current environment, and calculate its Hamiltonian. 
We could use the same procedure as outlined in the previous paragraph, i.e., 
to use the DmrgBasisWithOperators class to resize the matrices for all lo- 
cal operators. Storing matrices in this case (even in a sparse way and even 
considering that there is truncation) would be too much of a penalty for per- 
formance. Therefore, in this latter case we do the outer product on-the-fly 
only, without storing any matrices. In Fig. 2(c) pSE contains the outer prod- 
uct of system and environment, but pSE is only a DmrgBasis object, not a 
DmrgBasisWithOperators object, i.e., it does not contain operators. 



We now consider Fig. 2(d), where the diagonalization of the system's plus en- 
vironment's Hamiltonian is performed. Since pSE, being only a DmrgBasis 
object, does not contain all the information related to the outer product of 
system and environment (as we saw, this would be prohibitively expensive), 
we need to pass the system's basis (pSprime) and the environment's basis 
(pEprime) to the diagonalization function {diagonalize() in DmrgSolver) in 
order to be able to form the outer product on-the-fly. There, since pSE does 
provide information about effective symmetry blocking, we block the Hamilto- 
nian matrix using effective symmetry, and call diagonaliseOneBlock() in Dm- 
rgSolver for each symmetry block. Only those matrix blocks that contain 
the desired or targeted number of electrons will be considered. To diagonal- 



ize Hamiltonian H we use the Lanczos methodjl4j. |15[ |. although this is also 
templated. 



For the Lanczos diagonalization method we also want to provide as much 
code isolation and modularity as possible. The Lanczos method needs only to 
know how to perform the operation x += Hy, given vectors x and y. Using 
this fact, we can separate the matrix type from the Lanczos method. To keep 
the discussion short this is not addressed here, but can be seen in the diag- 
onaliseOneBlock() function, and in classes SolverLanczos, Hamiltonian- 
InternalProduct, and DmrgModelHelper. The first of these classes con- 
tains an implementation of the Lanczos method that is templated on a class 
that simply has to provide the operation x += B.y and, therefore, it is generic 
and valid for any SCE model. It is important to remark that the operation 
X += B.y is finally delegated to the model in question. As an example, the op- 
eration X -|-= H for the Hubbard model is performed in function matrixVector- 



tion error. 
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Product() in class DmrgModelHubbard. This function performs only three 

tasks, (i) X -\- Hsystem?/; (ii) X + Henvironmenty tUld (hi) X -\- Hconnectiony ■ 

The fist two are straightforward, so we focus on the last one, in hamiltonian- 
ConnectionProduct(), that considers the part of the Hamiltonian that connects 
system and environment. This function runs the following loop: for every site 
i in the system and every site j in the environment it calculates x += Hij-y 
in function linkProduct, after finding the appropriate tight binding hopping 
value. 

The function linkProduct is useful not only for the Hubbard model, but it is 
generic enough to use in other SCE models that include a tight binding con- 
nection of the type c|^Cjo-, and, therefore, is part of a separate class, Connec- 
torHopping. Likewise, the function linkProduct in ConnectorExchange 
deals with Hamiltonian connections of the type Si ■ Sj, and can be used by 
models that include that type of term, such as the sample spin 1/2 Heisenberg 
model provided with DMRG++. We remind readers that wish to understand 
this code that the function linkProduct and, in particular, the related function 
fastOpProdlnter are more complicated than usual, since (i) the outer product 
is constructed on the fiy, and (ii) the resulting states of this outer product 
need to be reordered so that effective symmetry blocking can be used. 



4 Abstract Classes 

4-1 The Model Interface 

A sample SCE model, the one-band Hubbard model, 

tijcl^Cja + UiHi^Uii + Y ViUi^, 

is implemented in class DmrgModelHubbard. A sample DmrgModel- 
Heisenberg is also included for the spin 1/2 Heisenberg model JijSi ■ Sj. 
These models inherit from the abstract class DmrgModelBase. To imple- 
ment other SCE models one has to implement the functions prototyped in 
that abstract class. The interface (functions in DmrgModelBase) are docu- 
mented in place; here 1 briefly describe some of them. The matrixVectorProduct 
function needs to implement the operation x += H?/. The function addHamil- 
tonianConnection implements the Hamiltonian connection (e.g. tight-binding 
links in the case of the Hubbard Model or products S-i ■ Sj in the case of 
the spin 1/2 Heisenberg model) between two basis, basis2 and basisS, in the 
order of the outer product, basisl = SymmetryOrdering(6aszs2 ® basisS). 
This was explained before in Section 3.3, and the examples shown by Dmrg- 
ModelHubbard and DmrgModelHeisenberg will be helpful in the imple- 
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mentation of other models. Function setNaturalBasis sets certain aspects of 
the "natural basis" (usually the real-space basis) on a given block. The oper- 
ator matrices (e.g., for the Hubbard model or 5*,^ and S'f for the spin 1/2 
Heisenberg model) need to be set there, as well as the Hamiltonian and the 
effective quantum number for each state of this natural basis. To implement 
the algorithm for a fixed density, the number of electrons for each state is also 
needed . 



4-2 The Geometry Interface 



I present two sample geometries, one for ID chains and one for n-leg ladders in 
classes DmrgGeometryOneD and DmrgGeometryLadder. Both derive 
from the abstract class DmrgGeometryBase. To implement new geometries 
a new class needs to be derived from this base class, and the functions in 
the base class (the interface) needs to be implemented. As in the case of 
DmrgModelBase, the interface is documented in the code, but here I briefly 
describe the most important functions. 

The function setBlocksOfSites needs to set the initial block for system and 
environment, and for the vector of blocks X and Y to be added to system 
and environment, respectively, according to the convention given in Fig. 1. 
There are two calcConnectorType functions. Both calculate the type of con- 
nection between two sites i and j, which can be SystemSystem, SystemEnvi- 
ron, EnvironSystem or EnvironEnviron, where the names are self-explanatory. 
The function calcConnectorValue determines the value of the connector (e.g., 
tight-binding hopping for the Hubbard model or Jij for the case of the spin 
1/2 Heisenberg model) between two sites, delegating the work to the model 
class if necessary. The function findExtremes determines the extremes sites of 
a given block of sites and the function findReflection finds the "reflection" in 
the environment block of a given site in the system block or vice-versa. 



4-3 The Concurrency Interface: Code Parallelization 



The Concurrency class encapsulates parallelization. Two concrete classes 
that implement this interface are included in the present code. One is for 
serial code (ConcurrencySerial class) that does no parallelization at all, 
and the other one ( Concur rencyMpi class) is for parallelization based on 
the MP10. Other parallelization implementations, e.g. using pthreads, can be 
similarly written by implementing this interface. The interface is described 
in place in class Concurrency. Here, I briefly mention its most important 



See, for example, http://www-unix.mcs.anl.gov/mpi/ 
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Fig. 3. Ground-State Density-Of-States of the Hubbard Model on a 32 site chain 
with a constant Hubbard U = 1.0 and density 1.0. 



functions. Function rank() returns the rank of the current processor or thread. 
nprocs() returns the total number of processors. Functions loopCreate() and 
loop() handle a parallelization of a standard loop. Function gather() gathers 
data from each processor into the root processor. 



5 Conclusions 



This paper presents DMRG-I--I-, a code to calculate properties of strongly cor- 
related electron models with the DMRG method. The paper explains how to 
use the code for the Hubbard and spin 1/2 Heisenberg models on a one dimen- 
sional chain and on n-leg ladders, and how to add new models and geometries 
through the use of public interfaces. The rationale behind the design of the 
generic DMRG algorithm is also explained, as well as the implications for 
memory usage and performance. The ideas used in the code -and explained 
in the paper- regarding symmetry blocking, treatment of Hamiltonian con- 
nections and parallelization, can be of inspiration to other researchers. The 
code implements two efficiency techniques (suggested originally in [3]): (i) 
the wave function transformation which transforms the wave function from 
the previous step to the current step to use as the initial vector for the Lanc- 
zos solver, and (ii) the use of different truncation values "m" for different finite 
size loops. Other efficiency improvements will be added to the present code 
in the future, for example, the use of "disk stacks" instead of memory stacks 
(std:: stack will be replaced by a DiskStack class), and the implementation of 
the reflection symmetry for the inflnite size algorithm (implying a factor of 2 
gain during the inflnite size algorithm phase). Future work will also include a 
systematic treatment of "correlation" observables of the type (OjOj), which 
can be addressed in a generic way. 
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7 Test Run 



(1) Download source code from here http : //www. ornl . gov/~gzl/dmrgPlusPlus/ 
(a stable version will be published by Computer Physics Communica- 
tions) 

(2) Create a sample driver code (main.cpp file) for the program by executing: 
perl configure.pl 

and answering the questions regarding model and geometry. Defaults val- 
ues can be chosen by pressing enter. This will also create a Makefile and 
sample input file. 

(3) Run make to compile and link the code. The LAPACK library is required 
by the program. File INSTALL contains more details on compilation. 

(4) Optionally, edit the sample input input. inp to adjust the parameters of 
the run. This file is self-explanatory. 

(5) Run the code with ./dmrg input . inp > output.dat 

(6) Notel: Progress is written to standard output. Energies and continued- 
fraction data is written to the file specified in input. inp: 
parameters . number0fKeptStates=64 

parameters . linSize=8 
parameters . density=l 

(other echo of input omitted) 
#Energy=-3 . 57537 
#Energy=-5 . 62889 
#Energy=-7 . 69483 
#Energy=-9 . 76627 

(7) Note2: To obtain local density of states data (such as Fig. 3): (i) run the 
code with the option calculateLDOS (see file README for a description 
of the options line in the input file) and (ii) process the continued fraction 
data as follows: 

perl contfraction.pl data.txt 16 -4 4 0.01 > figure. dos 

where [-4,4] is the energy interval over which the local density of states 
calculation is to be performed and 0.01 is the energy step increment. 
The method used to compute the density of states data is the continued- 
fraction method l3l- Other methods, such as the correction vector method 18 



have been proposed (see [10|] for a detailed review). 
(8) Note3: A detailed explanation of compilation instructions and required 
software can be found in the file INSTALL. A detailed explanation of 
input and output can be found in the file README. 
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